Abstract



The present paper provides a comprehensive study of de-noising properties of frames and, in 
particular, tight frames, which constitute one of the most popular tools in contemporary signal 
processing. The objective of the paper is to bridge the existing gap between mathematical and 
statistical theories on one hand and engineering practice on the other and explore how one 
can take advantage of a specific structure of a frame in contrast to an arbitrary collection of 
vectors or an orthonormal basis. For both the general and the tight frames, the paper presents 
a set of practically implementable de-noising techniques which take frame induced correlation 
structures into account. These results are supplemented by an examination of the case when 
the frame is constructed as a collection of orthonormal bases. In particular, recommendations 
are given for aggregation of the estimators at the stage of frame coefficients. The paper is 
concluded by a finite sample simulation study which confirms that taking frame structure and 
frame induced correlations into account indeed improves de-noising precision.

1 Introduction

In recent years, there has been a resurgence of interest in de-noising by frames across different
communities. The effort was undertaken by mathematicians working in the area of approximation
theory, by the statistics and computer science communities (the "large-p, small-n" problem and model
selection), and by the engineering community (regularization theory and sparse coding of signals and
images).

The need for overcomplete representations stems from the fact that, though a single orthogonal
basis allows very fast computations, it very often fails to represent a function of interest, $f$,
efficiently, so that one needs a large number of coefficients to transmit or store. In fact, if $f$ is
expanded over a much more exhaustive dictionary with $p$ elements, it can very often be represented
with very few nonzero coefficients. Moreover, one can reduce the error of representation beyond
what is possible when one orthonormal basis is used.

Mathematicians and statisticians have dealt with this problem for years. However, the methods
which they designed were intended for an arbitrary dictionary and did not take advantage of its
particular structure. For this reason, those methods work very well in a regression-type setup
when one does not need to obtain results instantaneously.

One of the most popular groups of methods relies on minimizing the difference between
the function $f$ and its representation under some set of constraints. From the point of view of
optimization theory, this problem can be re-formulated as the problem of minimization of the penalized
risk of the representation of $f$. Various choices of penalties and risk functions were suggested,
leading to RIDGE regression (see, e.g., Brown and Zidek (1980)), BRIDGE regression (Frank and
Friedman (1993)), LASSO (Tibshirani (1996)), the Dantzig selector (Candes and Tao (2007)), least
angle regression (Efron et al. (2004)) and Support Vector regression (Smola and Scholkopf (2004)),
among others.

Such methods neither assume nor exploit any specific structure of the dictionary and, as 
a result, are very computationally expensive. For this reason, those methods cannot be used for 
real-time problems and, as a result, are not very popular in practical engineering applications. 

For many years, engineers have been using frames, especially tight frames, due to their simple
reconstruction properties. However, when it comes to de-noising, engineers routinely treat frames,
especially tight frames, as if they were orthonormal bases, completely ignoring correlations between
frame vectors and using thresholding methodologies developed for the case of orthonormal bases.
This sentiment is well expressed in a recent paper of Yu, Mallat and Bacry (2008) which states that
"a tight frame behaves like a union of a orthogonal bases." For this reason, there is a multitude
of engineering papers where methodologies designed for orthonormal bases are applied to frames
without any consideration of frame structure.

There has been a growing sentiment in the statistics community that discounting correlations
associated with frames reduces de-noising precision. A few authors focused their attention on the
universal threshold, which is frequently used by engineers in the context of frames as if all frame
functions were independent, which leads to a threshold that is too large. Among them, Downie
and Silverman (1998) considered multiwavelets, which constitute a particular type of a frame.
Walker and Chen (2010) studied universal thresholding in the case of Gabor frames with a Blackman
window. Recently, Haltmeier and Munk (2012) derived the universal threshold for a general
frame satisfying rather stringent conditions which ensure that the threshold depends on the number
of frame functions but not on frame structure.

However, to the best of our knowledge, there has never been a comprehensive study of the
de-noising properties of frames and, in particular, tight frames, which constitute one of the most
popular tools in contemporary signal processing. Note that general statistical methods designed
for correlated noise do not lead to fast computations and are impractical in this setup since the
covariance matrix is too big. The objective of the present paper is to bridge the existing gap
between mathematical and statistical theories on one hand and engineering practice on the other
and explore how one can take advantage of a specific structure of a frame in contrast to an arbitrary
collection of vectors or an orthonormal basis. In particular, the purpose of this paper is to provide
a set of practically implementable de-noising techniques which take the correlation structure of the
frame coefficients into account.

In pursuing this goal, we start with the derivation of the oracle (best in the mean square sense)
linear diagonal shrinkage estimator. It turns out that, by construction, the i-th element of the
diagonal shrinkage matrix depends not only on the i-th frame coefficient but also on coefficients
related to it (not necessarily in its vicinity). In this sense, the oracle can be regarded as a block
shrinkage procedure, where the length and the constitution of the block are automatically determined
by the correlation structure of frame coefficients induced by the frame transform. The oracle is
followed by the derivation of the Stein Unbiased Risk Estimator (SURE) in the case of a general frame
and an arbitrary de-noising strategy. The SURE formulation is very similar to the one obtained
by Blu and Luisier (2007) for the case of interscale image de-noising. Next, we use this result
for designing particular types of de-noising algorithms (linear shrinkage, soft or hard thresholding,
etc.). The SURE provides a good assessment tool for de-noising with any kind of a frame and can, in
fact, be used for construction of frames with specific properties. It also leads to fast computational
procedures and, at the same time, better de-noising precision since it exploits both the sparsity
and the correlation structure of the frame. Subsequently, we explore the case of a tight frame and
show how the techniques suggested for general frames are naturally simplified and sped up in
this situation.

Finally, we consider the case when a tight frame is formed as a collection of orthonormal 
bases. In this situation, hypothetically, one can obtain estimators for each of the orthonormal bases 
separately and then combine them with weights which sum up to unity. Normally, in engineering 
practice, the estimators are just combined with equal weights as it is done, for example, in cycle 
spinning. However, one can use different weights with the objective of obtaining an estimator with 
better risk properties. The set of methods which exploit this idea is called aggregation and was
studied extensively by the statistics community in the last decade (see, for instance, Bunea and Nobel
(2008), Bunea, Tsybakov and Wegkamp (2007), Gribonval (2003), Guleryuz (2007), Juditsky and
Nemirovski (2000), Juditsky, Rigollet, and Tsybakov (2008), Leung and Barron (2006), Wegkamp
(2003) and Yang (2001)).

Nevertheless, aggregation techniques have various limitations which make them unsuitable 
for engineering practice. The existing techniques treat estimators as constant (the risk is condi- 
tioned on those estimators) or require sequential constructions of regression estimators and, in both 
cases, lead to expensive computational procedures. In particular, the algorithm of Bunea, Tsy- 
bakov and Wegkamp (2007) involves high-dimensional optimization which is impossible to carry 
out in real-time computations. Leung and Barron (2006), on the other hand, treat each of the 
regression estimators as variable and work out an oracle expression for the risk which allows them 
to offer an explicit choice of weights. However, due to the fact that the estimators are combined at 
the final stage, the authors cannot take full advantage of their approach and are able to combine 
only one type of estimators, the least squares estimators, in particular, the least squares estimators 
based on one basis function each. 

In what follows, we take a more general and flexible approach to the aggregation problem. 
We study the situation when both the risk and the estimators are variable and they are combined 
before reconstruction, at the stage of frame coefficients. In particular, we assume that a tight 
frame is constructed as a collection of orthonormal bases. The frame coefficients are subsequently 
de-noised and, finally, the function is re-constructed using variable weights for each of the bases. 
Using results of the earlier parts of the paper, we derive an oracle expression for the risk which 
is not conditioned on a particular estimation strategy and can take into account any explicit de- 
noising technique. Moreover, unlike in Leung and Barron (2006) and other aggregation papers, we 
derive an expression which contains unknown weights in explicit form, making it easier to carry 
out the necessary optimization. Furthermore, our approach allows one to explore both the situation
of data independent weights (or fixed estimators) and data dependent weights. In the former case,
we validate one of the main reasons for popularity of frames in engineering. Indeed, we show 
that, if the frame is constructed as a combination of orthonormal bases, then the risk of any frame 
estimator obtained as a linear combination of the estimators in each basis is smaller than the linear 
combination of the risks. 

The rest of the paper is organized as follows. Section 2 presents oracle expressions for the
mean squared risks of the diagonal shrinkage and thresholding estimators in the case of general or
tight frames. Section 3 provides SURE rules for those estimators. Results obtained in Sections 2
and 3 are used in Section 4 for designing optimal thresholding or shrinkage algorithms. Section 5
treats the case when the frame is constructed as a collection of orthonormal bases; in particular, it
gives recommendations on how the estimators can be aggregated at the frame coefficient stage, before
reconstruction. Section 6 studies the performance of the methodologies developed in the paper via
numerical simulations carried out on test and real signals. Section 7 concludes the paper with a
discussion. Finally, Section 8 contains the proofs of the statements presented in the paper.

2 Oracle expression for the risk for general or tight frames 

A collection of functions $\{w_i\}$ forms a frame in a separable Hilbert space $H$ if there exist two
positive frame bounds $C_l$ and $C_u$ such that, for any $f \in H$,

$$C_l \|f\|^2 \le \sum_i |\langle f, w_i \rangle|^2 \le C_u \|f\|^2. \tag{2.1}$$

As particular cases of frames one can list Gabor frames, in which the set $\{w_i\}$ comprises translated
and modulated versions of the same function, short time (or windowed) Fourier transforms and
wavelet frames.

In the space of discrete signals of length $n$, one usually considers $N$ vectors $w_i \in \mathbb{C}^n$,
$i = 1, \cdots, N$, which together form the matrix $W \in \mathbb{C}^{N \times n}$. In this notation, (2.1) implies that $W$ is
the matrix of a frame operator if for any $f \in L^2(\mathbb{R}^n)$ one has

$$C_l \|f\|^2 \le f^* W^* W f \le C_u \|f\|^2, \tag{2.2}$$

where $W^*$ is the conjugate transpose of $W$. The latter guarantees that the eigenvalues of the matrix
$V = W^*W$ are bounded above and below and, therefore, $V$ is invertible.

If the frame bounds are equal to each other, $C_l = C_u = a$, then the frame is called tight and $a$
is referred to as the frame constant. In the case of a tight frame the generalized Parseval's identity
holds and $W^*W$ is proportional to the identity matrix. In what follows, we shall assume that if the
frame is tight, then $W^*W = a I_n$. However, a tight frame can also be normalized so that $a = 1$,
as it is done for the Gabor frame which is used for simulations in Section 6.
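The definitions above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes NumPy, real-valued random frames, and a hypothetical helper `frame_bounds`. It reads the frame bounds off as the extreme eigenvalues of $V = W^*W$ and checks that stacking two orthonormal bases gives a tight frame with $a = 2$.

```python
import numpy as np

def frame_bounds(W):
    """Return (C_l, C_u): smallest and largest eigenvalues of V = W* W."""
    V = W.conj().T @ W
    eig = np.linalg.eigvalsh(V)  # V is Hermitian, eigenvalues come back in ascending order
    return eig[0], eig[-1]

rng = np.random.default_rng(0)
n, N = 8, 20

# Generic frame: N random vectors in R^n stored as rows of W (illustrative choice).
W_general = rng.standard_normal((N, n))
c_l, c_u = frame_bounds(W_general)

# Tight frame: the identity basis stacked on a random orthonormal basis (m = 2).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W_tight = np.vstack([np.eye(n), Q])
t_l, t_u = frame_bounds(W_tight)
```

For the stacked frame both bounds coincide with the number of bases, which is the tight-frame constant used later for unions of orthonormal bases.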

Consider the problem of recovering a vector $f \in \mathbb{R}^n$ from its noisy observation

$$x = f + \delta, \qquad \delta \sim N(0, \sigma^2 I_n). \tag{2.3}$$

Applying the frame transform $W$ to both sides of equation (2.3), we obtain

$$y = \theta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 U), \tag{2.4}$$

where $y = Wx$, $\theta = Wf$, $\varepsilon = W\delta$ and $U = WW^* \in \mathbb{C}^{N \times N}$. The goal of the analysis is to
reduce noise in the vector of frame coefficients $y$ by shrinking or thresholding its components,
thus obtaining a vector $\hat\theta$, and, subsequently, to estimate $f$ by

$$\hat f = V^{-1} W^* \hat\theta = W^+ \hat\theta, \tag{2.5}$$

where $W^+ = (W^*W)^{-1} W^*$ is the Moore-Penrose inverse of the matrix $W$.
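A short sketch of the model (2.3)-(2.5) may help fix the notation. It assumes NumPy and a real-valued setting; the random frame and the sinusoidal test signal are illustrative choices, not the paper's data. It checks that the Moore-Penrose inverse coincides with $(W^*W)^{-1}W^*$ and that, with no shrinkage at all, reconstruction returns the noisy observation $x$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, sigma = 16, 40, 0.5

W = rng.standard_normal((N, n))            # hypothetical frame matrix, rows are frame vectors
f = np.sin(2 * np.pi * np.arange(n) / n)   # hypothetical clean signal

x = f + sigma * rng.standard_normal(n)     # noisy observation, model (2.3)
y = W @ x                                  # frame coefficients, model (2.4)
U = W @ W.T                                # the noise in y has covariance sigma^2 * U

W_plus = np.linalg.pinv(W)                 # Moore-Penrose inverse of W
x_back = W_plus @ y                        # formula (2.5) with theta_hat = y
```

Since $U$ has rank $n < N$, the noise in $y$ is degenerate and strongly correlated, which is the structure the later sections exploit.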

We assume that the vector of frame coefficients $\theta$ is estimated by $\hat\theta = \Gamma y$ where
$\Gamma = \mathrm{diag}(\gamma_1, \cdots, \gamma_N)$ is a fixed diagonal matrix in $[0,1]^{N \times N}$. The next statement provides an
oracle expression for the risk of this estimator.






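Theorem 1 If $\hat\theta = \Gamma y$ where $\Gamma$ is a fixed diagonal matrix, then

$$\mathbb{E}\|\hat f - f\|^2 = \mathrm{Tr}\left[U^-(I_N - \Gamma)\theta\theta^*(I_N - \Gamma) + \sigma^2\, \Gamma U \Gamma U^-\right]. \tag{2.6}$$

If the frame is tight, the previous expression takes the form

$$\mathbb{E}\|\hat f - f\|^2 = a^{-2}\,\mathrm{Tr}\left[U(I_N - \Gamma)\theta\theta^*(I_N - \Gamma) + \sigma^2\, \Gamma U \Gamma U\right]. \tag{2.7}$$

Here, $U = WW^*$ and $U^- = (W^+)^* W^+$.

Proofs of this and later statements are given in Section 8.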

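Note that expressions (2.6) and (2.7) require simple minimization of quadratic forms due to
the following identity

$$\underset{\Gamma = \mathrm{diag}(\gamma)}{\mathrm{argmin}} \left\{\mathrm{Tr}\left[U^-(I_N - \Gamma)\theta\theta^*(I_N - \Gamma) + \sigma^2\, \Gamma U \Gamma U^-\right]\right\} = \underset{\gamma}{\mathrm{argmin}} \left\{\gamma^* A \gamma - 2\gamma^* b\right\}, \tag{2.8}$$

where $A = (\theta\theta^*) \circ U^- + \sigma^2 (U \circ U^-)$, $b = ((\theta\theta^*) \circ U^-)e_N$, $e_N$ is the vertical vector with all components
equal to one and $\circ$ denotes the Hadamard (element-wise) matrix product. According to identity
(2.8), the optimal gain vector $\gamma = \mathrm{diag}(\Gamma)$ can be presented as

$$\gamma = \left(\theta\theta^* \circ U^- + \sigma^2\, U \circ U^-\right)^{-1} \left(\theta\theta^* \circ U^-\right) e_N$$

and, in the case of a tight frame, it takes the form

$$\gamma = \left(\theta\theta^* \circ U + \sigma^2\, U \circ U\right)^{-1} \left(\theta\theta^* \circ U\right) e_N. \tag{2.9}$$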

It is worth noting that, by construction, the weights $\gamma_i$ in the best linear diagonal estimator
are functions not only of $\theta_i$ but also of other coefficients in its neighborhood. In this sense, the
best linear diagonal estimator is no longer diagonal and represents an overlapping block shrinkage
procedure where the length of the block is automatically determined by the correlations induced
by the frame operator. Moreover, observe that matrix $A$ is invertible since the Hadamard product
of two positive-definite matrices is positive-definite. In addition, matrix $U^-$ usually has a block
structure, so that the inversion of $A$ could be carried out by fast algorithms specifically designed
for this case.
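The oracle gain in (2.8) can be computed in a few lines. The sketch below is an illustration under stated assumptions rather than the paper's implementation: it uses NumPy, a real-valued random frame, and hypothetical helper names `oracle_risk` and `oracle_gain`. As a numerical precaution it solves $A\gamma = b$ with `np.linalg.lstsq` instead of forming an explicit inverse, so that it still returns a minimizer if $A$ happens to be ill-conditioned.

```python
import numpy as np

def oracle_risk(gamma, theta, U, U_minus, sigma):
    """Risk (2.6) for Gamma = diag(gamma), real-valued case."""
    G = np.diag(gamma)
    R = np.diag(1.0 - gamma)
    bias = np.trace(U_minus @ R @ np.outer(theta, theta) @ R)
    var = sigma**2 * np.trace(G @ U @ G @ U_minus)
    return bias + var

def oracle_gain(theta, U, U_minus, sigma):
    """Minimizer of gamma' A gamma - 2 gamma' b from identity (2.8)."""
    TT = np.outer(theta, theta)
    A = TT * U_minus + sigma**2 * (U * U_minus)   # '*' on arrays is the Hadamard product
    b = (TT * U_minus) @ np.ones(len(theta))
    gamma, *_ = np.linalg.lstsq(A, b, rcond=None)
    return gamma

rng = np.random.default_rng(2)
n, N, sigma = 6, 10, 0.3
W = rng.standard_normal((N, n))                   # hypothetical frame
W_plus = np.linalg.pinv(W)
U, U_minus = W @ W.T, W_plus.T @ W_plus           # U = W W*, U^- = (W^+)* W^+
theta = W @ rng.standard_normal(n)                # true frame coefficients (oracle knowledge)

gamma_star = oracle_gain(theta, U, U_minus, sigma)
risk_star = oracle_risk(gamma_star, theta, U, U_minus, sigma)
risk_identity = oracle_risk(np.ones(N), theta, U, U_minus, sigma)  # no shrinkage
```

Without shrinkage the risk equals $\sigma^2 n$, so comparing `risk_star` with `risk_identity` shows how much the oracle can gain on a given signal. The solution is not clipped to $[0,1]$ here.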

According to Theorem 1, for hard thresholding one needs to minimize risk (2.6) or (2.7) over
the set of arbitrary diagonal matrices with zero or unit values. Observe that in the case of an
orthonormal basis, the oracle (2.7) takes a familiar form

$$\mathbb{E}\|\hat f - f\|^2_{hard} = \sum_{i=1}^n \left[\theta_i^2\, I(\gamma_i = 0) + \sigma^2 I(\gamma_i = 1)\right]$$

and motivates one to keep larger coefficients and discard smaller ones irrespective of the particular
value of matrix $W$. The situation changes when matrix $W$ ceases to be unitary. Indeed, Theorem 1
implies that the choice of coefficients to "keep" or "kill" depends not only on their values but also
on the entries of matrix $U$.

3 SURE rules for general or tight frames

The advantage of the oracle expressions is that they allow one to construct unbiased estimators for the
risk. Indeed, matrix $\Theta = \theta\theta^*$ can be written as $\Theta = \mathbb{E}(yy^*) - \sigma^2 U$ and estimated by $\hat\Theta = yy^* - \sigma^2 U$.
The latter leads to the following unbiased estimator for the risk:

Corollary 1 If $\hat\theta = \Gamma y$ where $\Gamma$ is a fixed diagonal matrix, then

$$\mathbb{E}\|\hat f - f\|^2 = \sigma^2 n + \mathbb{E}\Delta, \tag{3.1}$$

where

$$\Delta = y^*(I_N - \Gamma)U^-(I_N - \Gamma)y - 2\sigma^2\,\mathrm{Tr}\left[U^-U(I_N - \Gamma)\right]. \tag{3.2}$$

In particular, if $\Gamma$ induces a hard thresholding rule, i.e. $\gamma_i = 1$ or $0$, then

$$\Delta = \sum_{i,j=1}^N \left[y_i y_j U^-_{ij} - 2\sigma^2 (U^-U)_{ii}\, I(i = j)\right] I(\gamma_i = 0)\, I(\gamma_j = 0). \tag{3.3}$$

Since matrix $\Theta$ is non-negative definite, all its diagonal elements should be non-negative,
which leads to the relations

$$\hat\Theta_{ii} = y_i^2 - \sigma^2 U_{ii} \ge 0.$$

These inequalities themselves enforce hard thresholds $\sigma\sqrt{U_{ii}}$ on the values of $y_i$. The oracle
expression (2.7) allows for further reduction of the risk.

The oracles (2.6) and (2.7), though, are of limited value since they do not allow one to assess the
risk of more sophisticated rules where matrix $\Gamma$ itself depends on $y$. In this case, one can write $\hat\theta$
as

$$\hat\theta = y + g(y). \tag{3.4}$$

Then, using a modification of SURE, one obtains the following result:

Theorem 2 Let the data follow model (2.3) and $y$ be of the form (2.4). Let $\hat f$ be given by formula
(2.5) with $\hat\theta$ of the form (3.4) where $g(y): \mathbb{R}^N \to \mathbb{R}^N$ is a continuous and piecewise differentiable
column vector function. Let $Z = \nabla_y g^*(y)$ be an $N \times N$-dimensional matrix with components

$$Z_{ij} = \frac{\partial [g_j(y)]}{\partial y_i}. \tag{3.5}$$

Then, the mean quadratic risk is given by expression (3.1) with

$$\Delta = g^*(y)U^-g(y) + 2\sigma^2\,\mathrm{Tr}[U^-UZ]. \tag{3.6}$$

If the frame is tight, then $U^- = a^{-2}U$ and

$$\Delta = a^{-2} g^*(y)Ug(y) + 2\sigma^2 a^{-1}\,\mathrm{Tr}[UZ]. \tag{3.7}$$

Note that Theorem 2 allows one to obtain explicit expressions for various types of thresholding
or shrinkage procedures, as well as to construct unbiased estimators of the risk of those procedures.
If one uses linear shrinkage $\Gamma$, then $g(y) = (\Gamma - I_N)y$ and $Z = \Gamma - I_N$, so that Theorem 2 recovers
expression (3.2) for $\Delta$.

In the case of soft thresholding with variable threshold $t_i$, one has $\hat\theta_i = (y_i - \mathrm{sgn}(y_i)t_i)\, I(|y_i| > t_i)$,
so that $g_i(y)$ is of the form

$$g_i(y) = -\mathrm{sgn}(y_i)\min(|y_i|, t_i), \quad i = 1, \cdots, N. \tag{3.8}$$

Hence, $Z$ is a diagonal matrix with elements

$$Z_{ii} = -I(|y_i| < t_i) \tag{3.9}$$

and the following corollary is valid.

Corollary 2 If $g(y)$ is defined by (3.8), then the risk is of the form (3.1) with

$$\Delta = \sum_{i,j=1}^N \left[\mathrm{sgn}(y_i y_j)\min(|y_i|, t_i)\min(|y_j|, t_j)\, U^-_{ij} - 2\sigma^2 (U^-U)_{ii}\, I(i = j)\, I(|y_i| < t_i)\right]. \tag{3.10}$$

If the frame is tight, the previous expression simplifies to

$$\Delta = \sum_{i,j=1}^N \left[a^{-2}\,\mathrm{sgn}(y_i y_j)\min(|y_i|, t_i)\min(|y_j|, t_j)\, U_{ij} - 2\sigma^2 a^{-1} U_{ii}\, I(i = j)\, I(|y_i| < t_i)\right]. \tag{3.11}$$

It is easy to check that, in the case of an orthonormal basis, familiar expressions for the risks
can be recovered from formula (3.11). Indeed, setting $n = N$, $a = 1$ and $U = I_n$, as before,
one obtains:

$$\mathbb{E}\|\hat f - f\|^2_{soft} = \sigma^2 n + \mathbb{E}\left[\sum_{i=1}^n \min(y_i^2, t_i^2) - 2\sigma^2 \sum_{i=1}^n I(|y_i| < t_i)\right].$$
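As an illustration of how (3.7)-(3.9) translate into computation, here is a minimal sketch for soft thresholding with a tight frame. It assumes NumPy and real-valued coefficients; the function name `sure_delta_soft_tight` and the small test vector are hypothetical. Because $Z$ is diagonal, $\mathrm{Tr}[UZ]$ reduces to a sum over the diagonal of $U$.

```python
import numpy as np

def sure_delta_soft_tight(y, t, U, a, sigma):
    """Delta of (3.7) for soft thresholding, tight frame with constant a (real case)."""
    g = -np.sign(y) * np.minimum(np.abs(y), t)     # g_i(y) from (3.8)
    z = -(np.abs(y) < t).astype(float)             # diagonal of Z from (3.9)
    quad = g @ U @ g / a**2                        # a^{-2} g* U g
    trace = 2.0 * sigma**2 / a * np.sum(np.diag(U) * z)  # 2 sigma^2 a^{-1} Tr[U Z]
    return quad + trace

# Orthonormal-basis sanity check: U = I, a = 1, sigma = 1.
y = np.array([3.0, -0.5, 0.2, -2.0])
t = np.full(4, 1.0)
delta = sure_delta_soft_tight(y, t, np.eye(4), 1.0, 1.0)

# Classical orthonormal form: sum of min(y_i^2, t_i^2) minus 2 sigma^2 times the kill count.
reference = np.sum(np.minimum(y**2, t**2)) - 2.0 * np.count_nonzero(np.abs(y) < t)
```

In the orthonormal case the two quantities agree, which mirrors the reduction stated above; for a genuine tight frame the off-diagonal entries of $U$ contribute cross terms between coefficients.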



4 Designing optimal thresholding or shrinkage algorithms 

One can use expressions derived in the previous section to design an optimal shrinkage or
thresholding strategy. In what follows, just to be specific, we consider the case of linear shrinkage and
hard thresholding only. Other shrinkage or thresholding techniques including soft thresholding
can be analyzed in a similar manner. Note that since matrices $U$, $U^-$ and $U^-U$ are not
data-dependent, they may be calculated in advance and, thus, the main computational complexity lies
in solving the resulting optimization problems.

4.1 Linear shrinkage 

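Recall that the risk is of the form (3.1) where $\Delta = \mathrm{Tr}[U^-(I_N - \Gamma)yy^*(I_N - \Gamma) + 2\sigma^2 U^-U\Gamma - 2\sigma^2 U^-U]$.
Since the last term of the expression for $\Delta$ is independent of $\Gamma$, one needs to minimize

$$F(\Gamma) = \mathrm{Tr}\left[U^-(I_N - \Gamma)yy^*(I_N - \Gamma) + 2\sigma^2 U^-U\Gamma\right].$$

Direct calculations show that this minimization takes the form of a quadratic programming
problem. Indeed, if one defines matrix $A = (yy^*) \circ U^-$ and vectors $\gamma = \mathrm{diag}(\Gamma)$ and $b = (A - \sigma^2\, U \circ U^-)e_N$,
then, up to an additive constant which does not depend on $\gamma$,

$$F(\Gamma) = F(\gamma) = \gamma^* A \gamma - 2\gamma^* b, \tag{4.1}$$

and the optimal $\gamma \in [0,1]^N$ which minimizes $F(\gamma)$ takes the form

$$\gamma = \left((yy^*) \circ U^-\right)^{-1} \left((yy^*) \circ U^- - \sigma^2\, U \circ U^-\right) e_N. \tag{4.2}$$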

Since matrices $U$, $U^-$ and $yy^*$ are nonnegative definite and Hermitian, matrix $A$ is also nonnegative
definite and Hermitian, and, thus, the quadratic programming problem is convex. Furthermore,
note that matrix $A$ and vector $b$ are sparse. For example, in the case of a tight frame, expressions
for $A$ and $b$ take the forms $A = a^{-2}(yy^*) \circ U$ and $b = (A - \sigma^2 a^{-2}\, U \circ U)e_N$. Since the majority of entries
of matrix $U$ are equal to zero, the respective entries of matrix $A$ also vanish.

The optimization problem (4.1) can also be modified by adding a penalty term $\mathrm{pen}(\gamma)$ to
$F(\gamma)$. In particular, one can use a quadratic penalty term $\gamma^* P \gamma$ with a positive definite matrix
$P$ or an $\ell_p$ penalty of the form $\mathrm{pen}(\gamma) = \mu\|\gamma\|_{\ell_p}$ where $\|\cdot\|_{\ell_p}$ is a vector norm in $\ell_p$ space, which
induces sparsity whenever $0 < p \le 1$.
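For the quadratic penalty, the penalized problem has a closed-form solution: minimizing $\gamma^* A\gamma - 2\gamma^* b + \gamma^* P\gamma$ amounts to solving $(A + P)\gamma = b$. The sketch below is an assumption-laden illustration, not the paper's procedure: it uses NumPy, a real-valued random frame, the particular choice $P = \mu I_N$, and hypothetical names `ridge_shrinkage_gain` and `penalized_objective`. It ignores the box constraint $\gamma \in [0,1]^N$, which would require a constrained quadratic programming solver.

```python
import numpy as np

def ridge_shrinkage_gain(y, U, U_minus, sigma, mu):
    """Solve (A + mu I) gamma = b with A, b as in (4.1), real-valued case."""
    A = np.outer(y, y) * U_minus                           # (y y*) Hadamard U^-
    b = (A - sigma**2 * (U * U_minus)) @ np.ones(len(y))   # b = (A - sigma^2 U o U^-) e_N
    P = mu * np.eye(len(y))                                # assumed penalty matrix
    gamma = np.linalg.solve(A + P, b)                      # A + P is positive definite
    return gamma, A, b

def penalized_objective(gamma, A, b, mu):
    """F(gamma) plus the quadratic penalty, up to an additive constant."""
    return gamma @ A @ gamma - 2.0 * gamma @ b + mu * gamma @ gamma

rng = np.random.default_rng(5)
n, N, sigma, mu = 6, 12, 0.4, 0.1
W = rng.standard_normal((N, n))                            # hypothetical frame
W_plus = np.linalg.pinv(W)
U, U_minus = W @ W.T, W_plus.T @ W_plus
y = W @ (rng.standard_normal(n) + sigma * rng.standard_normal(n))

gamma, A, b = ridge_shrinkage_gain(y, U, U_minus, sigma, mu)
```

The penalty makes the linear system well posed regardless of the rank of $A$, at the price of some bias in the gains that grows with $\mu$.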



4.2 Hard thresholding 

In the case of hard thresholding, the SURE is of the form (3.1) with $\Delta$ given by expression (3.3).
In order to minimize the expression for $\Delta$ in (3.3), introduce matrix $H$ with components

$$H_{ij} = \begin{cases} y_i y_j U^-_{ij}, & \text{if } i \ne j, \\ y_i^2 U^-_{ii} - 2\sigma^2 (U^-U)_{ii}, & \text{if } i = j. \end{cases}$$

Consider a set of indices $J$ such that $j \in J$ if $\gamma_j = 0$ and $j \notin J$ otherwise. Then $\Delta$ can be re-written
as

$$\Delta = \sum_{i,j \in J} H_{ij}$$

and the goal is to find a set of indices $J$ such that the sum of the respective row and column elements
of matrix $H$ is minimal. This minimization can be accomplished by a kind of greedy algorithm
which can be carried out as follows.



Greedy algorithm

1. Since diagonal values of matrix $H$ are counted once while all other elements are counted
twice, introduce the modified matrix $\tilde H$ with elements

$$\tilde H_{ij} = \begin{cases} H_{ij}, & \text{if } i \ne j, \\ H_{ij}/2, & \text{if } i = j. \end{cases}$$

Set $J = \{1, \cdots, N\}$.

2. Find a column $l$ of $\tilde H$ with the maximum sum of elements.

3. If the sum of elements of column $l$ is positive, then eliminate column $l$ and row $l$ from $\tilde H$
and index $l$ from set $J$, and RETURN TO STEP 2. If the sum of elements of column $l$ is zero or
negative, then FINISH.

4. Set $\gamma_j = 0$ if $j \in J$ and $\gamma_j = 1$ if $j \notin J$.
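The four steps above can be sketched as follows. This is one possible reading of the greedy algorithm, assuming NumPy and real-valued coefficients; the function name `greedy_hard_threshold` and its argument `UmU_diag` (the diagonal of $U^-U$) are hypothetical. Column sums are recomputed over the indices still in $J$ after each elimination.

```python
import numpy as np

def greedy_hard_threshold(y, U_minus, UmU_diag, sigma):
    """Greedy minimization of Delta = sum_{i,j in J} H_ij; returns (gamma, Delta)."""
    N = len(y)
    H = np.outer(y, y) * U_minus                      # off-diagonal: y_i y_j U^-_ij
    H[np.diag_indices(N)] -= 2.0 * sigma**2 * UmU_diag  # diagonal: y_i^2 U^-_ii - 2 sigma^2 (U^-U)_ii
    H_mod = H.copy()
    H_mod[np.diag_indices(N)] /= 2.0                  # step 1: halve the diagonal
    active = np.ones(N, dtype=bool)                   # membership in J, initially all indices
    while active.any():
        idx = np.flatnonzero(active)
        col_sums = H_mod[np.ix_(active, active)].sum(axis=0)  # step 2, restricted to J
        k = np.argmax(col_sums)
        if col_sums[k] <= 0:                          # step 3: stop when no positive column
            break
        active[idx[k]] = False                        # remove index l from J
    gamma = np.where(active, 0.0, 1.0)                # step 4: kill indices in J, keep the rest
    delta = H[np.ix_(active, active)].sum()
    return gamma, delta

# Orthonormal-basis check: U^- = I, so coefficients with y_i^2 > 2 sigma^2 are kept.
gamma_on, delta_on = greedy_hard_threshold(
    np.array([3.0, 0.5, -2.0, 1.0]), np.eye(4), np.ones(4), 1.0)
```

Each elimination lowers $\Delta$ by twice the selected column sum, so the procedure is monotone, but as a greedy scheme it is not guaranteed to reach the global minimum over all index sets.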



5 Frame constructed as a collection of orthonormal bases 

Consider the case when a frame is constructed as a collection of $m$ orthonormal bases. In this case,
$N = nm$ and matrix $W$ has a block structure with $m$ vertical blocks $W^{(i)} \in \mathbb{C}^{n \times n}$, $i = 1, \cdots, m$,
such that $W^{(i)}(W^{(i)})^* = (W^{(i)})^* W^{(i)} = I_n$. Denote $U^{(i,j)} = W^{(i)}(W^{(j)})^*$. Then, matrix $U$ is a
block matrix with blocks $U^{(i,j)}$, $i, j = 1, \cdots, m$, and $U^{(i,i)} = I_n$. It is easy to see that $W$ constitutes
a tight frame with $a = m$.

An interesting phenomenon for the frame of this type is that, since each of the matrices $W^{(i)}$
allows complete reconstruction of $f$, one can combine those reconstructions with non-equal weights.
Let $\Lambda$ be a block-diagonal matrix with blocks $\Lambda^{(i,i)} = \lambda_i I_n$, $i = 1, \cdots, m$, where weights $\lambda_i$ sum to
unity:

$$\sum_{i=1}^m \lambda_i = 1. \tag{5.1}$$

Note that, under condition (5.1), one has

$$W^* \Lambda W = \sum_{i=1}^m (W^{(i)})^* \lambda_i W^{(i)} = \sum_{i=1}^m \lambda_i (W^{(i)})^* W^{(i)} = I_n.$$

Therefore, if $\theta = Wf$, then $f$ can be reconstructed as $f = W^* \Lambda \theta$ and estimated by

$$\hat f = W^* \Lambda \hat\theta. \tag{5.2}$$
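The weighted reconstruction identity is easy to verify numerically. The following sketch assumes NumPy, real-valued random orthonormal bases obtained from QR factorizations, and an arbitrary illustrative weight vector; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 3

# m random orthonormal bases stacked vertically into the frame matrix W (N = n m rows).
bases = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(m)]
W = np.vstack(bases)

lam = np.array([0.5, 0.3, 0.2])            # illustrative weights satisfying (5.1)
Lam = np.kron(np.diag(lam), np.eye(n))     # block-diagonal matrix with blocks lam_i * I_n

f = rng.standard_normal(n)
theta = W @ f                              # frame coefficients
f_back = W.T @ Lam @ theta                 # reconstruction f = W* Lambda theta
```

Any weights summing to one reproduce $f$ exactly from noiseless coefficients; the choice of weights only matters once the coefficients are replaced by their de-noised versions.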



Usually, in current engineering practice, the weights are chosen to be equal (as in, e.g., cycle
spinning). However, this choice does not allow one to reduce the total risk by assigning a smaller
weight to an estimator with a higher risk.

It is easy to see that the problem of choosing weights in this setup is intimately related
to the aggregation problem studied in, for instance, Bunea and Nobel (2008), Bunea, Tsybakov and
Wegkamp (2007), Gribonval (2003), Guleryuz (2007), Juditsky and Nemirovski (2000), Juditsky,
Rigollet, and Tsybakov (2008), Leung and Barron (2006), Wegkamp (2003) and Yang (2001) among
others. Indeed, note that one can consider an estimator of $f$ of the form $\hat f^{(i)} = (W^{(i)})^* \hat\theta^{(i)}$ where
$\hat\theta^{(i)} = y^{(i)} + g^{(i)}(y^{(i)})$ is an estimator of $\theta^{(i)} = W^{(i)} f$, the coefficients of representation of $f$ in the
$i$-th basis. Then, estimator $\hat f$ in (5.2) can be re-written as

$$\hat f = \sum_{i=1}^m \lambda_i \hat f^{(i)}. \tag{5.3}$$
i=l 

The difference between our approach and aggregation techniques, however, is that we carry out
aggregation at the level of frame coefficients, not estimators of $f$ themselves. This will allow us to
avoid, if desired, both conditioning on the estimation technique ("constant estimators") and treating
weights as data-independent.

Nevertheless, we shall start with the case of data-independent weights and then investigate 
a more elaborate case where weights are data-dependent. 



5.1 Data independent weights: better than the best basis 

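If the weights are data independent, then direct calculations show that

$$\mathbb{E}\|\hat f - f\|^2 = \sum_{i,j=1}^m \lambda_i \lambda_j \rho_{ij} \quad \text{with} \quad \rho_{ij} = \mathbb{E}\left[(\hat\theta^{(i)} - \theta^{(i)})^* U^{(i,j)} (\hat\theta^{(j)} - \theta^{(j)})\right]. \tag{5.4}$$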



The statement below shows that the error of $\hat f$ is always smaller than the weighted sum of the
errors of estimators $\hat f^{(i)}$.

Theorem 3 If $\hat f$ is defined in (5.2) and weights $\lambda_i$ are data independent and satisfy condition
(5.1), then

$$\mathbb{E}\|\hat f - f\|^2 = \sum_{i=1}^m \lambda_i\, \mathbb{E}\left[\|\hat f^{(i)} - f\|^2 - \|\hat f^{(i)} - \hat f\|^2\right], \tag{5.5}$$

so that if, furthermore, the weights are nonnegative,

$$\mathbb{E}\|\hat f - f\|^2 \le \sum_{i=1}^m \lambda_i\, \mathbb{E}\|\hat f^{(i)} - f\|^2.$$
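The decomposition behind Theorem 3 holds for every realization, before taking expectations, whenever the weights sum to one. A minimal numerical check, assuming NumPy and using random stand-ins for the per-basis estimators (the arrays `f_hats` and `lam` are illustrative, not the paper's estimators):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 10, 4

f = rng.standard_normal(n)                         # true signal
f_hats = f + 0.3 * rng.standard_normal((m, n))     # stand-ins for the estimators f_hat^(i)
lam = np.array([0.4, 0.3, 0.2, 0.1])               # nonnegative weights summing to one

f_agg = lam @ f_hats                               # aggregated estimator, formula (5.3)

err_each = np.sum((f_hats - f) ** 2, axis=1)       # ||f_hat^(i) - f||^2
spread = np.sum((f_hats - f_agg) ** 2, axis=1)     # ||f_hat^(i) - f_hat||^2

lhs = np.sum((f_agg - f) ** 2)                     # ||f_hat - f||^2
rhs = np.sum(lam * (err_each - spread))            # right-hand side of (5.5), pointwise
weighted_errors = np.sum(lam * err_each)
```

The gap between `weighted_errors` and `lhs` is exactly the weighted spread of the individual estimators around their aggregate, which is why combining diverse estimators helps.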

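Theorem 3 does not allow one to choose optimal weights, since weights enter expression (5.5)
implicitly in the form of $\hat f$. In order to evaluate the expression for the risk in the case of constant
weights, one needs to combine expression (5.3) and formula (3.7) with $g^{(i)}(y^{(i)})$, $U^{(i,j)}$ and $Z^{(i,i)}$
instead of $g(y)$, $U$ and $Z$, respectively:

$$\mathbb{E}\|\hat f - f\|^2 = \sum_{i,j=1}^m \lambda_i \lambda_j \left\{\sigma^2\,\mathrm{Tr}[U^{(i,j)} U^{(j,i)}] + \mathbb{E}\left[(g^{(i)})^* U^{(i,j)} g^{(j)} + 2\sigma^2\,\mathrm{Tr}[U^{(i,j)} Z^{(j,j)} U^{(j,i)}]\right]\right\},$$

where, for the sake of brevity, we denoted $g^{(i)}(y^{(i)}) = g^{(i)}$. Taking into account that $U^{(i,i)} = I_n$ and
$U^{(i,j)} U^{(j,i)} = I_n$, one arrives at the risk of the form (3.1) with

$$\Delta = \sum_{i,j=1}^m \lambda_i \lambda_j (g^{(i)})^* U^{(i,j)} g^{(j)} + 2\sigma^2 \sum_{i=1}^m \lambda_i\, \mathrm{Tr}[Z^{(i,i)}]. \tag{5.6}$$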

The above expression contains weights in explicit form and allows one to choose optimal weights for
any kind of a shrinkage or thresholding technique. Observe that the choice of a "best" basis
corresponds to one of the coefficients $\lambda_i$ being one and the others being zero. This would be an
optimal choice if matrix $\rho$ with entries $\rho_{ij}$ defined in (5.4) were diagonal. However, since this is
not the case, the choice of only one estimator versus a mixture may not be the best strategy, both
from the point of view of risk and even sparsity (see, e.g., Elad and Yavneh (2009)).

5.2 Data dependent weights 

In order to study the case of data-dependent weights, recall that $y \in \mathbb{R}^N$ is a vector with $m$
block-components $y^{(i)} = \theta^{(i)} + \varepsilon^{(i)}$, where $\varepsilon^{(i)} \sim N(0, \sigma^2 I_n)$ and $\theta^{(i)}$ is estimated by $\hat\theta^{(i)} = y^{(i)} + g^{(i)}(y^{(i)})$.
Introduce data dependent weights $\lambda_i(y)$ such that relation (5.1) is valid for any value of $y$.

Here, we ought to point out two essential features of our choice of weights. First, we explicitly
choose weights depending on frame coefficients $y$ rather than raw data $x$. Second, weights for each
basis depend on all frame coefficients. Re-writing (5.2), we obtain

$$\hat f = \sum_{i=1}^m \lambda_i(y) (W^{(i)})^* \hat\theta^{(i)} = \sum_{i=1}^m (W^{(i)})^* \left\{\lambda_i(y)\left[y^{(i)} + g^{(i)}(y^{(i)})\right]\right\}.$$

Note that $\hat f$ in the last expression can be presented as $\hat f = m^{-1} W^* \tilde\theta$, the tight frame reconstruction
of the estimator $\tilde\theta = \tilde\theta(y) = y + \tilde g(y)$ of frame coefficients. Here $\tilde g(y)$ is a block vector with blocks

$$\tilde g^{(i)}(y) = m \lambda_i(y)\left[y^{(i)} + g^{(i)}(y^{(i)})\right] - y^{(i)}. \tag{5.7}$$

Hence, we can use expression (3.7) in Theorem 2 with $a = m$ and $\tilde g$ and $\tilde Z$ instead of $g$ and $Z$,
respectively.

Theorem 4 If $\hat f$ is defined in (5.2) and weights $\lambda_i = \lambda_i(y)$ are data-dependent and satisfy
condition (5.1) for every $y$, then the risk is of the form (3.1) with

$$\Delta = \sum_{i,j=1}^m \lambda_i(y) \lambda_j(y) (g^{(i)})^* U^{(i,j)} g^{(j)} + 2\sigma^2 \sum_{i=1}^m \lambda_i(y)\, \mathrm{Tr}(Z^{(i,i)}) + \Delta_\lambda, \tag{5.8}$$

$$\Delta_\lambda = 2\sigma^2 \sum_{i,j=1}^m \mathrm{Tr}\left\{U^{(i,j)} \left[\nabla_{y^{(j)}} \lambda_i(y)\right] (\hat\theta^{(i)})^*\right\}.$$

Here, same as before, $\hat\theta^{(i)}(y^{(i)}) = y^{(i)} + g^{(i)}(y^{(i)})$, and, for the sake of brevity, we omitted $(y^{(i)})$ in
the expressions $g^{(i)}(y^{(i)})$ and $\hat\theta^{(i)}(y^{(i)})$.



Straightforward comparison shows that the first two terms in (5.8) coincide with the respective
terms in (5.6) while the last term vanishes when the weights are data independent. Expression
(5.8) contains weights explicitly, so, hypothetically, it can be used for choosing data dependent
weights.

The difficulty with using formula (5.8), however, lies in the fact that one would like to choose
weights depending not on frame coefficients $y$ but rather on the risk of the $i$-th estimator $\hat\theta^{(i)}(y^{(i)})$,
or, more precisely, on the Stein unbiased estimator of this risk. For this reason, one needs to learn
how to find partial derivatives of the unbiased estimator of the risk, which is accomplished by the
following statement.

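Lemma 1 Let the data follow model $y = \theta + \varepsilon$ where $y, \theta, \varepsilon \in \mathbb{R}^n$ and $\varepsilon \sim N(0, \sigma^2 I_n)$. Let $\hat\theta$ be an
estimator of $\theta$ of the form $\hat\theta(y) = y + g(y)$. Then for the Stein unbiased estimator

$$r(y) = \sigma^2 n + g^*(y)g(y) + 2\sigma^2\,\mathrm{Tr}\left[\nabla_y g^*(y)\right]$$

of the risk $\mathbb{E}\|\hat\theta(y) - \theta\|^2$ one has

$$\nabla_y r(y) = 2[\nabla_y g^*(y)]g(y) + 2\sigma^2 d(y), \tag{5.9}$$

where $d(y)$ is a column vector with components

$$d_k(y) = \sum_{l=1}^n \frac{\partial^2 g_l(y)}{\partial y_k\, \partial y_l}, \quad k = 1, \cdots, n. \tag{5.10}$$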

Recall that in the case of fixed linear shrinkage $g_i(y) = (\gamma_i - 1)y_i$ and for soft thresholding
$g_i(y) = -\mathrm{sgn}(y_i)\min(|y_i|, t)$, where $t$ is the threshold, one has $d(y) = 0$. Therefore, in those two
cases, $\nabla_y r(y) = 2[\nabla_y g^*(y)]g(y)$.
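For soft thresholding, the gradient in Lemma 1 can be written out and compared with a finite-difference approximation. The sketch below assumes NumPy, an orthonormal setting with a common threshold $t$, and hypothetical function names `sure_soft` and `sure_soft_grad`. With $d(y) = 0$, formula (5.9) gives $[\nabla_y r(y)]_i = 2 y_i I(|y_i| < t)$ away from the points $|y_i| = t$, where $r$ is not differentiable.

```python
import numpy as np

def sure_soft(y, t, sigma):
    """Stein unbiased risk estimate r(y) for soft thresholding with threshold t."""
    g = -np.sign(y) * np.minimum(np.abs(y), t)
    div = -np.count_nonzero(np.abs(y) < t)       # Tr[grad_y g*(y)]
    return sigma**2 * len(y) + g @ g + 2.0 * sigma**2 * div

def sure_soft_grad(y, t):
    """Gradient from (5.9) with d(y) = 0: 2 [grad g*] g = 2 y_i I(|y_i| < t)."""
    return 2.0 * y * (np.abs(y) < t)

def central_difference(fun, y, h=1e-6):
    """Numerical gradient of fun at y (valid away from the kinks at |y_i| = t)."""
    grad = np.zeros_like(y)
    for k in range(len(y)):
        e = np.zeros_like(y)
        e[k] = h
        grad[k] = (fun(y + e) - fun(y - e)) / (2.0 * h)
    return grad

y0 = np.array([0.3, -0.4, 2.0, -1.5])
t0, sigma0 = 1.0, 0.7
analytic = sure_soft_grad(y0, t0)
numeric = central_difference(lambda v: sure_soft(v, t0, sigma0), y0)
```

Only coefficients below the threshold contribute to the gradient, because the risk estimate is locally constant in the coefficients that survive thresholding.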

Expression (5.8) can potentially be used in order to minimize $\Delta$ with respect to the weights.
However, this expression is too general to use. Hence, following Leung and Barron (2006), we
consider weights in the exponential form.

5.3 Weights in the exponential form 

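Let the weights be of the form

$$\lambda_i(y) = \frac{\pi_i \exp\left[-\beta \eta_i(y^{(i)})\right]}{\sum_{k=1}^m \pi_k \exp\left[-\beta \eta_k(y^{(k)})\right]}, \tag{5.11}$$

where $\pi_i > 0$, $i = 1, \cdots, m$. Presentation (5.11) guarantees that the weights $\lambda_i(y)$ sum to unity.
Usually, the most intuitive choice is $\eta_i = r_i(y^{(i)})$, the SURE of the $i$-th estimator $\hat\theta^{(i)}(y^{(i)})$.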
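Exponential weights of this kind are straightforward to compute once the values $\eta_i$ are available. The sketch below assumes NumPy and the normalized form of the weights; the function name `exponential_weights` and the default uniform prior are illustrative assumptions. Subtracting the smallest $\eta_i$ before exponentiating leaves the weights unchanged and avoids numerical underflow when the risk estimates are large.

```python
import numpy as np

def exponential_weights(eta, beta, pi=None):
    """Weights proportional to pi_i * exp(-beta * eta_i), normalized to sum to one."""
    eta = np.asarray(eta, dtype=float)
    pi = np.ones_like(eta) if pi is None else np.asarray(pi, dtype=float)
    shifted = -beta * (eta - eta.min())   # common shift cancels in the ratio
    w = pi * np.exp(shifted)
    return w / w.sum()

# Example: three bases with hypothetical SURE values; lower risk gets a larger weight.
weights = exponential_weights([1.0, 2.0, 3.0], beta=0.5)
```

Taking $\beta = 0$ with a uniform prior recovers equal weights, while large $\beta$ concentrates the weight on the basis with the smallest risk estimate, so $\beta$ interpolates between cycle-spinning-style averaging and best-basis selection.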

The following corollary of Theorem 4 provides an explicit expression for the unbiased
estimator of the risk for the weights in the form (5.11).

Corollary 3 If $\hat f$ is defined in (5.2) and weights are in the form (5.11), then the risk is of the
form (3.1) with $\Delta$ given by formula (5.8) and

[Display (5.12) is not recoverable from the source.]

$$\Delta_\lambda = 2\sigma^2 \beta \left\{\sum_{i,j=1}^m \lambda_i(y) \lambda_j(y) \left[\nabla_{y^{(j)}} \eta_j(y^{(j)})\right]^* U^{(j,i)} \hat\theta^{(i)} - \sum_{i=1}^m \lambda_i(y) \left[\nabla_{y^{(i)}} \eta_i(y^{(i)})\right]^* \hat\theta^{(i)}\right\}. \tag{5.13}$$

Here, as before, $\hat\theta^{(i)} = y^{(i)} + g^{(i)}$.

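Note that representation (5.12) of the risk is more compact but does not contain the weights
explicitly, while formula (5.13) is more convenient if one wants to minimize the risk with respect
to $\pi_i$, $i = 1, \cdots, m$, or $\beta$.

If $\eta_i(y^{(i)}) = r_i(y^{(i)})$, $i = 1, \cdots, m$, where $r_i(y^{(i)})$ is the unbiased estimator of the risk

$$r_i(y^{(i)}) = \sigma^2 n + \left[g^{(i)}(y^{(i)})\right]^* g^{(i)}(y^{(i)}) + 2\sigma^2\,\mathrm{Tr}\left[\nabla_{y^{(i)}} \left[g^{(i)}(y^{(i)})\right]^*\right]$$

of the estimator $\hat f^{(i)}$, then, by Lemma 1, one has

$$\nabla_{y^{(i)}} r_i(y^{(i)}) = 2\left[\nabla_{y^{(i)}} \left[g^{(i)}(y^{(i)})\right]^*\right] g^{(i)}(y^{(i)}) + 2\sigma^2 d_i(y^{(i)}),$$

where $d_i(y^{(i)})$ is a column vector with components

$$\left[d_i(y^{(i)})\right]_k = \sum_{l=1}^n \frac{\partial^2 g^{(i)}_l(y^{(i)})}{\partial y^{(i)}_k\, \partial y^{(i)}_l}, \quad k = 1, \cdots, n.$$

In particular, if one uses linear shrinkage or thresholding, soft or hard, then $d_i(y^{(i)}) = 0$.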



6 Simulation Study 

In this section, we carry out some numerical experiments to study the finite sample performances 
of the proposed estimators. It is well known that the choice of a frame is linked to the underlying 
function / to be de-noised. The advantage of using a frame compared to an orthogonal basis is that 
it can provide an efficient representation of a broad class of signals as well as better adaptivity for 
their parsimonious representation. In our simulation study, we use the classical Gabor frame with 
Hamming window. This is a tight frame which is particularly suitable for representation of fast 
oscillating signals such as audio signals. For that reason, we consider two fast oscillating standard 
test signals, WernerSorrows and MishMash, reproducible by MakeSignal of the toolbox Wavelab,
and two pieces of real audio signals, sp2-5k.wav and Glock.wav. The test signals listed above are
displayed in Figure 1.

Figure 1: Normalized test signals of length 1280 (panels: WernerSorrows, MishMash, sp2-5k, Glock).

The objective of this simulation study is to illustrate the gain in de-noising precision obtained
by taking the frame structure into account rather than to be an exhaustive study of signal
de-noising by frames. Results of all comparisons are presented in terms of the means and the
standard deviations of the $L^2$ errors. In order to show the advantage attained by accounting
for the frame structure, we compare the ideal best diagonal estimator obtained by minimizing the
true risk in (2.9) with the ideal best diagonal estimator obtained by minimizing the true risk
without taking into account the frame structure, i.e., considering $U = I$. We denote these two
estimators IDEAL$_U$ and IDEAL$_I$, respectively. Note that estimators IDEAL$_U$ and IDEAL$_I$
are not available in practice, but their comparison can give an idea of the best possible gain
obtained by taking into account the frame structure. The empirical versions of these estimators
are derived by substituting $\theta\theta^*$ with its unbiased estimator $yy^* - \sigma^2 U$ and $yy^* - \sigma^2 I_N$, respectively.
In the first case, we obtain

\[ \gamma = (yy^T \circ U)^{-1} \left( (yy^T - \sigma^2 U) \circ U \right) e_N, \]

which coincides with solution (4.2) in the case $U^- = U$, while, in the second case, we obtain

\[ \gamma = (yy^T \circ I_N)^{-1} \left( (yy^T - \sigma^2 I_N) \circ I_N \right) e_N, \]

which component-wise reduces to the well known empirical Wiener filter $\gamma_i = (y_i^2 - \sigma^2)/y_i^2$. In
what follows, we refer to these estimators as EMP$_U$ and EMP$_I$, respectively. Since matrices
$(yy^T \circ U)$ and especially $(yy^T \circ I_N)$ sometimes have high condition numbers, in order to stabilize
their inversion in our simulation study, we add a quadratic penalization term $\gamma^* P \gamma$ to the functional
with matrix $P = \xi I_N$.
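The penalized empirical estimator can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `empirical_diagonal_weights` is hypothetical, $e_N$ is taken to be the vector of ones, and the penalty is assumed to enter a functional normalized as $\gamma^T A \gamma - 2\gamma^T b$, so that it simply adds $\xi I_N$ to the matrix being inverted.

```python
import numpy as np

def empirical_diagonal_weights(y, U, sigma, xi=0.0):
    """Solve (yy^T o U + xi I) gamma = ((yy^T - sigma^2 U) o U) e_N for gamma.

    Here 'o' denotes the Hadamard (entrywise) product and e_N is the
    vector of ones (an assumption about the notation).
    """
    y = np.asarray(y, dtype=float)
    U = np.asarray(U, dtype=float)
    N = y.size
    yyT = np.outer(y, y)
    A = yyT * U + xi * np.eye(N)                  # Hadamard product plus ridge term
    b = ((yyT - sigma**2 * U) * U).sum(axis=1)    # multiplying by e_N gives row sums
    return np.linalg.solve(A, b)

# With U = I and no penalty this reduces to the empirical Wiener filter
y = np.array([2.0, -3.0, 0.5])
gamma = empirical_diagonal_weights(y, np.eye(3), sigma=1.0)
```

Using `np.linalg.solve` avoids forming the inverse explicitly, which matters when the matrix is poorly conditioned.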

Results for $\xi = 10^{-4.5}$ are reported in Table 1 and are based on 100 simulation runs with
signal-to-noise ratios (SNR) 1, 3 and 5, which represent, respectively, severe, moderate and low
noise levels. As is standard in the statistical literature, the SNR is defined
here as the ratio of the standard deviations of the signal and the noise. The empirical estimators
EMP$_U$ and EMP$_I$ approximate the corresponding ideal estimators IDEAL$_U$ and IDEAL$_I$ when
the noise level is low (SNR=5) and may be quite far from them when the noise level is high
(SNR=1). However, for all the test signals, the ideal gain (the difference between the first and the
second columns) and the empirical gain (the difference between the third and the fourth columns)
obtained by accounting for the frame structure is quite significant, especially in the case of severe
noise.
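For concreteness, the SNR convention above (ratio of standard deviations) fixes the noise level as $\sigma = \mathrm{sd}(f)/\mathrm{SNR}$. A minimal sketch follows; the helper name `add_noise_at_snr` and the sinusoidal stand-in signal are hypothetical and are not the test signals of the paper.

```python
import numpy as np

def add_noise_at_snr(f, snr, rng):
    """Add white Gaussian noise so that std(signal) / std(noise) equals snr.

    Returns the noisy signal together with the noise standard deviation.
    """
    f = np.asarray(f, dtype=float)
    sigma = f.std() / snr                      # noise level implied by the SNR
    noise = sigma * rng.standard_normal(f.shape)
    return f + noise, sigma

rng = np.random.default_rng(0)
f = np.sin(np.linspace(0.0, 20.0, 1280))       # stand-in signal of length 1280
y, sigma = add_noise_at_snr(f, snr=3.0, rng=rng)
```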

In a similar manner, we carry out comparisons between soft thresholding procedures obtained
with and without consideration of the specific frame structure. In particular, we construct
estimators SOFT$_U$ and SOFT$_I$ which are obtained by the formula $\hat\theta_i = (y_i - \mathrm{sgn}(y_i)\, t)\, I(|y_i| > t)$
with the global threshold $t$ obtained by minimizing, respectively, expression (3.11) when $t_i = t$ for
all $i = 1, \ldots, N$, and the classical expression of the SURE reported in Donoho and Johnstone (1995).
Similarly, we compare estimators VISU$_U$ and VISU$_I$ obtained using the hard thresholding procedure
$\hat\theta_i = y_i\, I(|y_i| > t)$ where, in the first case, the universal threshold $t$ is the one provided in
Haltmeier and Munk (2012) (expression (6.1), with $z = \pi/\sqrt{6}$) and, in the second case, $t$ is the
classical universal threshold for the orthonormal bases, $t = \sigma \sqrt{2 \log N}$.

Results of comparisons are reported in Table 2 and are based on 100 simulation runs. It is
easy to notice that the gain obtained by taking frame structure into account is much more
significant in the case of SURE than for the VISU$_U$ and VISU$_I$ universal thresholding procedures.
This is due to the fact that the universal threshold is known to be too large for de-noising applications,
as has already been noted in the statistical literature (see, e.g., Donoho and Johnstone (1995)). In
fact, SURE-based soft thresholding procedures outperform the universal thresholding procedures
even if the former do not take the frame structure into account: it follows from Table 2 that
SOFT$_I$ has better precision than VISU$_U$ for every test signal and every noise level.

In order to examine the performance of the estimator proposed in Section 5, we study the
simple case of data independent weights. In particular, we consider two classical orthonormal
bases, Cosine and Haar, and three test functions, Window, LoSine and a combination of the
two (see Figure 2). The Window and the LoSine are classical test signals which are very well
represented, respectively, by the Haar and the Cosine bases. We evaluate estimator (5.3), where
$\lambda$ is derived by minimizing expression (5.6) and the $\hat f^{(i)}$'s are obtained as soft thresholding estimators
with the universal data-independent threshold. The risks of the estimators are presented in the
fourth column of Table 3 and the mean values of the estimated weights $\lambda$ are displayed in the fifth
column. Table 3 also reports the risks of the single estimators (columns one and two) and of the
average of the estimators obtained with the weights $\lambda_1 = \lambda_2 = 0.5$ (column three). Note that the
aggregation estimator is better than, or close to, the best basis estimator and it is always better than
the estimator obtained by simple average (i.e., by the default frame reconstruction). Moreover,
it is instructive to observe that the choice of weights $\lambda_i$ supplied by criterion (5.6) follows an
intuitive preference. Indeed, one would favor the Cosine basis for the LoSine signal, the Haar basis for
the Window signal, as well as a balanced combination of the two bases for the sum of these two
signals: computations confirm those intuitive assessments.
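The thresholding rules used for the estimators that ignore the frame structure (SOFT$_I$ and VISU$_I$) can be sketched as below. The SURE criterion is written in its classical form for independent Gaussian coefficients with known $\sigma$; the frame-adapted criterion (3.11) and the Haltmeier and Munk threshold are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(y, t):
    """Soft thresholding: shrink each coefficient toward zero by t."""
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def hard_threshold(y, t):
    """Hard thresholding: keep coefficients with |y_i| > t, zero the rest."""
    return y * (np.abs(y) > t)

def universal_threshold(sigma, N):
    """Classical universal threshold sigma * sqrt(2 log N)."""
    return sigma * np.sqrt(2.0 * np.log(N))

def sure_soft(y, t, sigma):
    """Classical SURE of soft thresholding for independent N(theta_i, sigma^2) data."""
    return (sigma**2 * y.size
            - 2.0 * sigma**2 * np.count_nonzero(np.abs(y) <= t)
            + np.sum(np.minimum(y**2, t**2)))

def sure_threshold(y, sigma):
    """Global threshold minimizing SURE over the candidates {0} and {|y_i|}.

    Between consecutive values of |y_i| the criterion is non-decreasing in t,
    so it suffices to evaluate it at these candidate points.
    """
    candidates = np.concatenate(([0.0], np.abs(y)))
    risks = [sure_soft(y, t, sigma) for t in candidates]
    return candidates[int(np.argmin(risks))]

y = np.array([3.0, -0.5, 1.0])
t_sure = sure_threshold(y, sigma=1.0)
theta_soft = soft_threshold(y, t_sure)
theta_hard = hard_threshold(y, universal_threshold(1.0, y.size))
```

The exhaustive search costs $O(N^2)$ operations, which is adequate for signals of the length used here; a sorted cumulative-sum implementation would reduce this to $O(N \log N)$.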
Table 1: Results obtained over 100 runs and with parameter choices, n = 1280 and 64-sampled
Hamming window.

Signal          SNR    IDEAL_U            IDEAL_I            EMP_U              EMP_I
WernerSorrows   1      [illegible]        [illegible]        [illegible]        [illegible]
                3      0.0284 (0.0019)    0.0404 (0.0022)    0.0777 (0.0034)    0.1343 (0.0049)
                5      0.0126 (0.0006)    0.0167 (0.0007)    0.0321 (0.0011)    0.0412 (0.0015)
MishMash        1      0.1026 (0.0106)    0.1837 (0.0139)    0.4881 (0.0254)    6.2411 (0.2290)
                3      0.0211 (0.0017)    0.0284 (0.0021)    0.0752 (0.0036)    0.1113 (0.0044)
                5      0.0094 (0.0007)    0.0122 (0.0008)    0.0324 (0.0013)    0.0286 (0.0015)
sp2-5k          1      0.1533 (0.0112)    0.2474 (0.0127)    0.5201 (0.0254)    6.2648 (0.1745)
                3      0.0363 (0.0022)    0.0548 (0.0024)    0.0849 (0.0039)    0.1771 (0.0058)
                5      0.0168 (0.0009)    0.0244 (0.0011)    0.0349 (0.0014)    0.0614 (0.0022)
Glock           1      0.0845 (0.0075)    0.1305 (0.0093)    0.4529 (0.0245)    6.4889 (0.2079)
                3      0.0192 (0.0014)    0.0278 (0.0016)    0.0737 (0.0037)    0.1232 (0.0043)
                5      0.0089 (0.0006)    0.0123 (0.0007)    0.0322 (0.0013)    0.0326 (0.0015)



Figure 2: Normalized test signals of length 1024.
Table 2: Results obtained over 100 runs and with parameter choices, n = 1280 and 64-sampled
Hamming window.

Signal          SNR    SOFT_U             SOFT_I             VISU_U             VISU_I
WernerSorrows   1      0.3748 (0.0188)    0.8511 (0.0414)    0.8987 (0.0199)    0.9024 (0.0200)
                3      0.0763 (0.0041)    0.1342 (0.0114)    0.3748 (0.3748)    0.3965 (0.3965)
                5      0.0327 (0.0016)    0.0481 (0.0039)    0.1230 (0.0041)    0.1275 (0.0041)
MishMash        1      0.3519 (0.0216)    0.8970 (0.0695)    0.9733 (0.0173)    0.9756 (0.0158)
                3      0.0602 (0.0040)    0.1063 (0.0095)    0.2434 (0.0148)    0.2573 (0.0160)
                5      [illegible]        [illegible]        [illegible]        [illegible]
sp2-5k          1      0.3893 (0.0197)    0.8555 (0.0693)    0.9934 (0.0120)    0.9952 (0.0105)
                3      0.0917 (0.0047)    0.1689 (0.0164)    0.3881 (0.0143)    0.4023 (0.0142)
                5      0.0457 (0.0026)    0.0626 (0.0059)    0.1740 (0.0052)    0.1799 (0.0050)
Glock           1      0.2853 (0.0186)    0.5064 (0.0462)    0.9181 (0.0453)    0.9350 (0.0462)
                3      0.0516 (0.0029)    0.0981 (0.0083)    0.1591 (0.0076)    0.1628 (0.0078)
                5      0.0228 (0.0012)    0.0406 (0.0034)    0.0898 (0.0026)    0.0919 (0.0026)



Table 3: Results obtained over 100 runs by Cosine and Haar bases. Parameter choices are
$n = 2^{10}$ and $J = 3$ for the Haar basis.

function          SNR    Cosine    Haar      averaging    aggregation    (lambda_1  lambda_2)
Window            1      0.2291    0.1719    0.1719       0.1648         (0.2200  0.7800)
                  3      0.0742    0.0214    0.0364       0.0214         (0.0105  0.9895)
                  5      0.0444    0.0076    0.0182       0.0077         (0.0045  0.9955)
LoSine            1      0.1284    0.9940    0.4118       0.1284         (1.0000  0.0000)
                  3      0.0427    0.6682    0.2221       0.0444         (0.9836  0.0164)
                  5      0.0259    0.3617    0.1177       0.0305         (0.9181  0.0819)
Window + LoSine   1      0.2644    0.3496    0.2673       0.2563         (0.7693  0.2307)
                  3      0.0855    0.2011    0.1037       0.0827         (0.8650  0.1350)
                  5      0.0509    0.1666    0.0720       0.0496         (0.8448  0.1552)
7 Discussion



The present paper provides a comprehensive study of de-noising properties of frames and, in 
particular, tight frames, which constitute one of the most popular tools in contemporary signal 
processing. The objective of the paper is to bridge the existing gap between mathematical and 
statistical theories on one hand and engineering practice on the other and explore how one can 
take advantage of a specific structure of a frame in contrast to an arbitrary collection of vectors 
or an orthonormal basis. 

For both the general and the tight frames, the paper presents a set of practically
implementable de-noising techniques which take frame induced correlation structures into account.
These results are supplemented by an examination of the case when the frame is constructed 
as a collection of orthonormal bases. In particular, recommendations are given for aggregation 
of the estimators at the stage of frame coefficients. The paper is concluded by a finite sample 
simulation study which confirms that taking frame structure and frame induced correlations into 
account indeed improves de-noising precision. 
8 Appendix

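Proof of Theorem 1. To verify expression (2.6), note that

\[ E\|\hat f - f\|^2 = E\, \mathrm{Tr}\left[W^+ (\Gamma y - \theta)(\Gamma y - \theta)^* (W^+)^*\right] = \Delta_1 + \Delta_2, \]

where

\[ \Delta_1 = \mathrm{Tr}\left[W^+ \Gamma\, E(\varepsilon\varepsilon^*)\, \Gamma^* (W^+)^*\right] = \sigma^2\, \mathrm{Tr}[\Gamma U \Gamma U^-] \]

and

\[ \Delta_2 = \mathrm{Tr}\left[W^+ (I_N - \Gamma)\, \theta\theta^* (I_N - \Gamma) (W^+)^*\right] = \mathrm{Tr}\left[U^- (I_N - \Gamma)\, \theta\theta^* (I_N - \Gamma)\right], \]

which completes the proof of (2.6). To prove (2.7), note that in the case of a tight frame, one has
$U^- U = a^{-2} U^2 = a^{-2} W W^* W W^* = a^{-1} U$ since $W^* W = a I_n$.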

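Proof of Corollary 1. Note that

\[ \mathrm{Tr}[\Gamma U \Gamma U^-] = \mathrm{Tr}\left[U U^- + (I_N - \Gamma) U (I_N - \Gamma) U^- - 2 (I_N - \Gamma) U U^-\right]. \]

Now, to prove (3.2), replace $\theta\theta^*$ by $yy^* - \sigma^2 U$ in (2.6) and observe that

\[ \mathrm{Tr}[U U^-] = \mathrm{Tr}\left[W W^* (W^+)^* W^+\right] = \mathrm{Tr}\left[(W^+ W)^* W^+ W\right] = n \tag{8.1} \]

since $W^+ W = I_n$. In order to obtain (3.2), replace $U^-$ with $a^{-2} U$.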
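The matrix identities used in the two proofs above are easy to check numerically. The sketch below builds a tight frame as a union of two orthonormal bases, so that $a = m = 2$, and evaluates each identity; the construction, the real-valued setting and the helper name `tight_frame_from_two_bases` are assumptions chosen for illustration.

```python
import numpy as np

def tight_frame_from_two_bases(n, seed=0):
    """Stack the identity basis and a random orthonormal basis into an N x n matrix W."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
    return np.vstack([np.eye(n), Q.T])                 # rows are the frame vectors

n = 4
W = tight_frame_from_two_bases(n)
a = 2.0                                   # frame bound of a union of two orthonormal bases
W_plus = np.linalg.pinv(W)                # Moore-Penrose pseudo-inverse W^+
U = W @ W.T                               # U = W W^*
U_minus = W_plus.T @ W_plus               # U^- = (W^+)^* W^+

checks = {
    "W^* W = a I_n": np.allclose(W.T @ W, a * np.eye(n)),
    "W^+ W = I_n": np.allclose(W_plus @ W, np.eye(n)),
    "Tr[U U^-] = n": np.isclose(np.trace(U @ U_minus), n),
    "U^- U = U / a": np.allclose(U_minus @ U, U / a),
    "U^- = U / a^2": np.allclose(U_minus, U / a**2),
}
```

Each entry of `checks` evaluates to `True` for this construction, in agreement with (8.1) and with the tight-frame relations used to pass from (2.6) to (2.7).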

Proof of Theorem 2. First, let us show that under conditions of Theorem 2 one has

\[ E\left[(\hat\theta - \theta)(\hat\theta - \theta)^*\right] = \sigma^2 U + E[g(y) g^*(y)] + 2\sigma^2 U\, E[Z]. \tag{8.2} \]

To this end, note that

\[ E\left[(\hat\theta - \theta)(\hat\theta - \theta)^*\right] = E\left[(y - \theta)(y - \theta)^* + g(y) g^*(y)\right] + 2 E\left[(y - \theta) g^*(y)\right] = \Omega_1 + 2\Omega_2. \]

Here $\Omega_1 = \sigma^2 U + E[g(y) g^*(y)]$ and, due to representations $y = Wx$ and $\theta = Wf$, the expression
for $\Omega_2$ may be written as $\Omega_2 = W\, E[(x - f) g^*(Wx)]$.

Denote $C_\sigma = (2\pi\sigma^2)^{-n/2}$ and observe that $Q = E[(x - f) g^*(Wx)]$ is the $n \times N$ matrix with
components

\[ Q_{lj} = E[(x_l - f_l)\, g_j(Wx)] = C_\sigma \int \cdots \int (x_l - f_l)\, g_j(Wx) \exp\left(-\|x - f\|^2 / 2\sigma^2\right) dx \]
\[ = -C_\sigma \sigma^2 \int \cdots \int g_j(Wx)\, d_l\left(\exp\left(-0.5\, \sigma^{-2} \|x - f\|^2\right)\right) dx_1 \cdots dx_{l-1}\, dx_{l+1} \cdots dx_n \]
\[ = C_\sigma \sigma^2 \int \cdots \int \frac{\partial}{\partial x_l}[g_j(Wx)] \exp\left(-\|x - f\|^2 / 2\sigma^2\right) dx = \sigma^2 E\left[\frac{\partial}{\partial x_l}\, g_j(Wx)\right]. \]

In the expression above, we denoted the differential with respect to $x_l$ by $d_l$ and used integration by
parts.

Applying the chain rule, derive that

\[ \frac{\partial}{\partial x_l}[g_j(Wx)] = \sum_{i=1}^N \frac{\partial}{\partial y_i}[g_j(y)]\, w_{il} = \sum_{i=1}^N z_{ij} w_{il} = (W^* Z)_{lj}. \]

Therefore,

\[ \Omega_2 = \sigma^2 E(W W^* Z) = \sigma^2 U\, E(Z), \]

which yields expression (8.2).

Now, to complete the proof of (3.6), observe that

\[ E\|\hat f - f\|^2 = E\, \mathrm{Tr}\left[(\hat\theta - \theta)(\hat\theta - \theta)^* U^-\right] = E\, \mathrm{Tr}\left[\sigma^2 U U^- + g(y) g^*(y) U^- + 2\sigma^2 U Z U^-\right] \]

and recall that, by formula (8.1), one has $\mathrm{Tr}[U U^-] = n$. Validity of formula (3.7) follows from the
fact that, in the case of a tight frame, one has $U^- U = a^{-2} U^2$.



Proof of Corollary 2. Validity follows directly from Theorem 2 and relations (3.8) and (3.9).

Proof of Theorem 3. Note that

\[ E\|\hat f - f\|^2 = \Big(\sum_{i=1}^m \lambda_i\Big) E\|\hat f - f\|^2 = \sum_{i=1}^m \lambda_i\, E\|\hat f - \hat f^{(i)} + \hat f^{(i)} - f\|^2 \]
\[ = \sum_{i=1}^m \lambda_i \left[ E\|\hat f - \hat f^{(i)}\|^2 + E\|\hat f^{(i)} - f\|^2 + 2 E (\hat f - \hat f^{(i)})^* (\hat f^{(i)} - f) \right]. \]

By direct calculations, it is easy to check that

\[ \sum_{i=1}^m \lambda_i (\hat f - \hat f^{(i)})^* (\hat f^{(i)} - f) = - \sum_{i=1}^m \lambda_i \|\hat f^{(i)} - \hat f\|^2. \]

Therefore, changing the order of expectation and summation (due to the fact that the weights are
data-independent) and using the identity above, we derive

\[ E\|\hat f - f\|^2 = E \sum_{i=1}^m \lambda_i \left[ \|\hat f^{(i)} - \hat f\|^2 + \|\hat f^{(i)} - f\|^2 - 2 \|\hat f^{(i)} - \hat f\|^2 \right], \]

which completes the proof.
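The decomposition at the heart of this proof holds for every realization, not only in expectation, whenever the weights sum to one. The following sketch checks it numerically on arbitrary vectors; the function names and the random test data are hypothetical and serve only to illustrate the identity.

```python
import numpy as np

def aggregate(estimates, weights):
    """Convex combination sum_i lambda_i * f_i of the rows of `estimates`."""
    return np.asarray(weights, dtype=float) @ np.asarray(estimates, dtype=float)

def risk_decomposition(estimates, weights, f):
    """Return both sides of
    ||f_hat - f||^2 = sum_i lam_i ||f_i - f||^2 - sum_i lam_i ||f_i - f_hat||^2.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    f_hat = aggregate(estimates, weights)
    lhs = np.sum((f_hat - f) ** 2)
    individual = np.sum((estimates - f) ** 2, axis=1)      # ||f_i - f||^2
    spread = np.sum((estimates - f_hat) ** 2, axis=1)      # ||f_i - f_hat||^2
    rhs = weights @ individual - weights @ spread
    return lhs, rhs

rng = np.random.default_rng(1)
estimates = rng.standard_normal((3, 16))    # three estimators of a length-16 signal
f = rng.standard_normal(16)
lhs, rhs = risk_decomposition(estimates, [0.2, 0.5, 0.3], f)
```

The two returned values agree up to rounding error, which shows that the aggregated estimator is never worse than the weighted average of the individual losses.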



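Proof of Theorem 4. Applying Theorem 2 with $a = m$ and a block vector $g$ with blocks given
by formula (5.7), one can write $\Delta$ as

\[ \Delta = m^{-2} g^*(y)\, U\, g(y) + 2\sigma^2 m^{-1}\, \mathrm{Tr}[U Z] = \Delta_1 + \Delta_2. \tag{8.3} \]

Here, re-arranging $g^{(i)}(y)$, we obtain

\[ \Delta_1 = \sum_{i,j=1}^m \lambda_i(y) \lambda_j(y) (g^{(i)})^* U^{(i,j)} g^{(j)} + m^{-2} \sum_{i,j=1}^m [1 - m\lambda_i(y)][1 - m\lambda_j(y)]\, (y^{(i)})^* U^{(i,j)} y^{(j)} \]
\[ \qquad - 2 m^{-1} \sum_{i,j=1}^m \lambda_i(y) [1 - m\lambda_j(y)]\, (g^{(i)})^* U^{(i,j)} y^{(j)}. \]

Since $U^{(i,j)} y^{(j)} = W^{(i)} (W^{(j)})^* y^{(j)} = W^{(i)} x = y^{(i)}$ and $\sum_{j=1}^m (1 - m\lambda_j) = 0$, the second and the
third terms in the last expression are equal to zero and

\[ \Delta_1 = \sum_{i,j=1}^m \lambda_i(y) \lambda_j(y) (g^{(i)})^* U^{(i,j)} g^{(j)}. \tag{8.4} \]

Now, consider $\Delta_2$. Note that $Z$ is a block matrix with blocks $Z^{(i,j)}$ which, with the help of
the product rule, can be presented in the form

\[ Z^{(i,j)} = \nabla_{y^{(i)}} (g^{(j)})^*(y) = m \left[\nabla_{y^{(i)}} \lambda_j(y)\right] (\hat\theta^{(j)})^* + m \lambda_i(y) \left[I_n + \nabla_{y^{(i)}} [g^{(i)}(y^{(i)})]^*\right] I(i = j) - I_n\, I(i = j). \]

Substituting the last expression into $\Delta_2$ in (8.3) and recalling that $U^{(i,i)} = I_n$, we arrive at

\[ \Delta_2 = 2\sigma^2 m^{-1} \sum_{i,j=1}^m \mathrm{Tr}\left[U^{(i,j)} Z^{(j,i)}\right] = 2\sigma^2 \sum_{i,j=1}^m \mathrm{Tr}\left[U^{(i,j)} \left[\nabla_{y^{(j)}} \lambda_i(y)\right] (\hat\theta^{(i)})^*\right] \]
\[ \qquad + 2\sigma^2 \sum_{i=1}^m \lambda_i(y)\, \mathrm{Tr}\left[I_n + \nabla_{y^{(i)}} [g^{(i)}(y^{(i)})]^*\right] - 2\sigma^2 n. \tag{8.5} \]

Now, interchange $i$ and $j$ in the first term of $\Delta_2$, and also note that

\[ \sum_{i=1}^m \lambda_i(y)\, \mathrm{Tr}\left[I_n + \nabla_{y^{(i)}} [g^{(i)}(y^{(i)})]^*\right] = n + \sum_{i=1}^m \lambda_i(y)\, \mathrm{Tr}\left[\nabla_{y^{(i)}} [g^{(i)}(y^{(i)})]^*\right]. \]

To complete the proof, combine (8.3), (8.4) and (8.5).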

Proof of Lemma 1. Observe that $\nabla_y [g^*(y) g(y)]$ is a column vector with components

\[ \frac{\partial}{\partial y_i} [g^*(y) g(y)] = 2 \sum_{k=1}^n \frac{\partial g_k(y)}{\partial y_i}\, g_k(y). \]

Hence, $\nabla_y [g^*(y) g(y)] = 2 [\nabla_y g^*(y)]\, g(y)$. Similarly, $\nabla_y \mathrm{Tr}[\nabla_y g^*(y)]$ is a column vector with
components

\[ d_k(y) = \frac{\partial}{\partial y_k} \sum_{l=1}^n \frac{\partial g_l(y)}{\partial y_l}, \]

which coincides with (5.10).
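Proof of Corollary 3. Denote

\[ \Psi(y) = \sum_{l=1}^m \pi_l \exp[-\beta\, \eta_l(y^{(l)})], \]

so that $\log(\lambda_i(y)) = \log(\pi_i) - \beta\, \eta_i(y^{(i)}) - \log(\Psi(y))$. Then, $\nabla_{y^{(j)}} \lambda_i(y) = \lambda_i(y)\, \nabla_{y^{(j)}} [\log(\lambda_i(y))]$
where

\[ \nabla_{y^{(j)}} [\log(\lambda_i(y))] = -\beta\, \nabla_{y^{(j)}} [\eta_i(y^{(i)})] + \beta \sum_{l=1}^m \lambda_l(y)\, \nabla_{y^{(j)}} [\eta_l(y^{(l)})]. \]

Taking into account that $\nabla_{y^{(j)}} [\eta_i(y^{(i)})] = 0$ if $i \neq j$, we derive

\[ \nabla_{y^{(j)}} \lambda_i(y) = \beta \lambda_i(y) \left[ \lambda_j(y)\, \nabla_{y^{(j)}} \eta_j(y^{(j)}) - \nabla_{y^{(i)}} \eta_i(y^{(i)})\, I(i = j) \right]. \]

Now, to complete the proof of (5.13), recall from Theorem 4 that

\[ \Delta_0 = 2\sigma^2 \sum_{i,j=1}^m \left[\nabla_{y^{(j)}} \lambda_i(y)\right]^* U^{(j,i)} \hat\theta^{(i)} \]

and insert the expression for $\nabla_{y^{(j)}} [\lambda_i(y)]$ into $\Delta_0$. To show validity of (5.12), note that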

m 

K(y)Xj(y) W y u)Vj(y (j) )Y §® = 



J7I 



J'=l 